Background

Executed Offenders Data

This is a storyboard project using death-row data from the Texas Department of Criminal Justice, or TDCJ.

TDCJ keeps keeps records of every inmate they execute.

In this project, I scrape the data from the HTML table, which contains some demographic information on each inmate. I use the Selector Gadget to locate and then build urls to the info and last statement for each inmate.

With some help from the purrr package, I was able to download all of the .jpg images from the website. We perform some image manipulation with the magick package.


Required Packages

HTML web-scraping packages

library(rvest)     
library(jsonlite)  
library(xopen)     
library(xml2)

Data wrangling/processing packages

library(dplyr)
library(tidyr)
library(janitor)
library(forcats)
library(lubridate)
library(stringr)
library(scales)
library(fs)

Graphs

library(paletteer)
library(ggthemes)
library(patchwork)

Shiny/dashboard packages

library(knitr)
library(flexdashboard)
library(gtsummary)

Image manipulation/processing

library(grid)
library(magick)
library(ggpubr)
library(jpeg)
library(paletteer)
library(ggthemes)
library(patchwork)
library(ggrepel)

Import

Scraping the table

Read HTML from URL

Load the xml2 package and define the url with the data (here it’s webpage_url).

webpage_url <- "http://www.tdcj.state.tx.us/death_row/dr_executed_offenders.html"
webpage <- xml2::read_html(webpage_url)

Exract HTML tables

Use the rvest::html_table() to find the table in the webpage object. This is at position [[1]].

ExOffndrsRaw <- rvest::html_table(webpage)[[1]] 

The dplyr::glimpse(40) function is helpful here.

# check the data.frame
ExOffndrsRaw |> dplyr::glimpse(60)
Rows: 576
Columns: 10
$ Execution    <int> 576, 575, 574, 573, 572, 571, 570, 56…
$ Link         <chr> "Inmate Information", "Inmate Informa…
$ Link         <chr> "Last Statement", "Last Statement", "…
$ `Last Name`  <chr> "Ramirez", "Chanthakoummane", "Buntio…
$ `First Name` <chr> "John", "Kosol", "Carl", "Rick", "Joh…
$ TDCJNumber   <int> 999544, 999529, 993, 999049, 999567, …
$ Age          <int> 38, 41, 78, 57, 45, 41, 45, 47, 64, 4…
$ Date         <chr> "10/5/2022", "8/17/2022", "4/21/2022"…
$ Race         <chr> "Hispanic", "Other", "White", "White"…
$ County       <chr> "Nueces", "Collin", "Harris", "Harris…

Fix the column names

We can see the Link column is repeated, which is going to be a problem when we put these data into their own tibble because R doesn’t like to repeat the column names inside a data.frame.

We will use the tibble::as_tibble() function, but add the .name_repair = "unique" argument. The .name_repair argument has other options ("check_unique", "unique", "universal" and "minimal"), and you can read the help files using ?as_tibble.

In this case, "unique" will work just fine.

ExecutedOffenders <- rvest::html_table(webpage)[[1]] |> 
  # repair the repeated columns
  tibble::as_tibble(.name_repair = "unique") |> 
  # get unique names
  janitor::clean_names(case = "snake") |> 
  # lower, snake case
  dplyr::rename(offender_info = link_2, 
                # rename these 
                last_statement = link_3)
ExecutedOffenders |> glimpse(60)
Rows: 576
Columns: 10
$ execution      <int> 576, 575, 574, 573, 572, 571, 570, …
$ offender_info  <chr> "Inmate Information", "Inmate Infor…
$ last_statement <chr> "Last Statement", "Last Statement",…
$ last_name      <chr> "Ramirez", "Chanthakoummane", "Bunt…
$ first_name     <chr> "John", "Kosol", "Carl", "Rick", "J…
$ tdcj_number    <int> 999544, 999529, 993, 999049, 999567…
$ age            <int> 38, 41, 78, 57, 45, 41, 45, 47, 64,…
$ date           <chr> "10/5/2022", "8/17/2022", "4/21/202…
$ race           <chr> "Hispanic", "Other", "White", "Whit…
$ county         <chr> "Nueces", "Collin", "Harris", "Harr…

The executed offenders table:

Wrangle

Identify the links with selector gadget

In order to get the nodes from the table, we need to send webpage through a few passes of rvest functions (html_nodes and html_attr) with various css tags to get the correct URL paths. This took a few tries and some trial and error, but eventually I was able to figure out the the correct combinations to get the Links to the pages.

Links <- webpage |> 
  # this get the links in the overflow table 
  # row
  rvest::html_nodes(".overflow tr") |> 
  # the links
  rvest::html_nodes("a") |> 
  # the header ref
  rvest::html_attr("href")
# check Links
Links |> utils::head(20)
 [1] "dr_info/ramirezjohn.html"             
 [2] "dr_info/ramirezjohnlast.html"         
 [3] "dr_info/chanthakoummanekosoul.html"   
 [4] "dr_info/chanthakoummanekosollast.html"
 [5] "dr_info/buntionc.jpg"                 
 [6] "dr_info/buntioncarllast.html"         
 [7] "dr_info/rhoadesrick.html"             
 [8] "dr_info/rhoadesricklast.html"         
 [9] "dr_info/hummeljohn.html"              
[10] "dr_info/hummeljohnlast.html"          
[11] "dr_info/jonesquintin.html"            
[12] "dr_info/jonesquintinlast.html"        
[13] "dr_info/wardlowbilly.html"            
[14] "dr_info/wardlowbillylast.html"        
[15] "dr_info/ochoaabel.html"               
[16] "dr_info/ochoaabellast.html"           
[17] "dr_info/gardnerjohn.html"             
[18] "dr_info/gardnerjohnlast.html"         
[19] "dr_info/runnelstravis.html"           
[20] "dr_info/runnelstravislast.html"       

Now Links contain:

  1. A dr_info/ path (which makes the entire path "http://www.tdcj.state.tx.us/death_row/dr_info/").

  2. Every offender has two links–one with their full name, the other with a last string attached to the back of their full name.

Something tells me if I check the base::length() of Links with the base::nrow()s in ExOffndrs…there will be twice as many links as rows in executed offenders.

length(Links)
[1] 1152
nrow(ExecutedOffenders)
[1] 576

Good–this is what I want. That means each row in ExecutedOffenders has two links associated with their name.

Create last_links statements

The stringr package can help me wrangle this long vector into the last_pattern logical vector, which I then use to subset the Links.

last_pattern <- stringr::str_detect(
                            string = Links, 
                            pattern = "last")
utils::head(Links[last_pattern])
[1] "dr_info/ramirezjohnlast.html"         
[2] "dr_info/chanthakoummanekosollast.html"
[3] "dr_info/buntioncarllast.html"         
[4] "dr_info/rhoadesricklast.html"         
[5] "dr_info/hummeljohnlast.html"          
[6] "dr_info/jonesquintinlast.html"        

Check to see that Links[last_pattern] is same length as the number of rows in ExecutedOffenders

base::identical(
  x = base::length(Links[last_pattern]), 
  y = base::nrow(ExecutedOffenders))
[1] TRUE

Great–subset the Links for the last_pattern, then give this vector a name (last_links).

last_links <- Links[last_pattern]
last_links |> utils::head(10)
 [1] "dr_info/ramirezjohnlast.html"         
 [2] "dr_info/chanthakoummanekosollast.html"
 [3] "dr_info/buntioncarllast.html"         
 [4] "dr_info/rhoadesricklast.html"         
 [5] "dr_info/hummeljohnlast.html"          
 [6] "dr_info/jonesquintinlast.html"        
 [7] "dr_info/wardlowbillylast.html"        
 [8] "dr_info/ochoaabellast.html"           
 [9] "dr_info/gardnerjohnlast.html"         
[10] "dr_info/runnelstravislast.html"       

If I check the length of items in last_links, I can see there are an identical number of rows in the data frame.

base::identical(
  x = base::length(last_links),
  y = base::nrow(ExecutedOffenders))
[1] TRUE

Assign the last_url column to ExecutedOffenders

This means I can easily assign these as a new column in ExecutedOffenders.

ExecutedOffenders |> glimpse(40)
Rows: 576
Columns: 10
$ execution      <int> 576, 575, 574, …
$ offender_info  <chr> "Inmate Informa…
$ last_statement <chr> "Last Statement…
$ last_name      <chr> "Ramirez", "Cha…
$ first_name     <chr> "John", "Kosol"…
$ tdcj_number    <int> 999544, 999529,…
$ age            <int> 38, 41, 78, 57,…
$ date           <chr> "10/5/2022", "8…
$ race           <chr> "Hispanic", "Ot…
$ county         <chr> "Nueces", "Coll…

Not done yet–I need to add the beginning of the web address:

https://www.tdcj.texas.gov/death_row/

# test 
ExecutedOffenders |> 
  dplyr::mutate(
    last_url = 
        paste0("https://www.tdcj.texas.gov/death_row/", 
                                  last_links)) |> 
  dplyr::pull(last_url) |> 
  utils::head(10) |> 
  base::writeLines()
https://www.tdcj.texas.gov/death_row/dr_info/ramirezjohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/chanthakoummanekosollast.html
https://www.tdcj.texas.gov/death_row/dr_info/buntioncarllast.html
https://www.tdcj.texas.gov/death_row/dr_info/rhoadesricklast.html
https://www.tdcj.texas.gov/death_row/dr_info/hummeljohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/jonesquintinlast.html
https://www.tdcj.texas.gov/death_row/dr_info/wardlowbillylast.html
https://www.tdcj.texas.gov/death_row/dr_info/ochoaabellast.html
https://www.tdcj.texas.gov/death_row/dr_info/gardnerjohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/runnelstravislast.html
# assign
ExecutedOffenders <- ExecutedOffenders |> 
  dplyr::mutate(
    last_url = 
        paste0("https://www.tdcj.texas.gov/death_row/", 
                                  last_links))

Now we will tidy these up into nice, clean LastUrl tibble.

LastUrl <- last_links |> 
  
  tibble::as_tibble(.name_repair = 
                          "unique") |> 
  
  tidyr::gather(key = "key",
                value = "value") |> 
  
  dplyr::select(name_last_url = value) |> 
  
  dplyr::mutate(name_last_url = 
        paste0("https://www.tdcj.texas.gov/death_row/", 
                                  last_links))

LastUrl$name_last_url |> 
  utils::head() |> 
  base::writeLines()
https://www.tdcj.texas.gov/death_row/dr_info/ramirezjohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/chanthakoummanekosollast.html
https://www.tdcj.texas.gov/death_row/dr_info/buntioncarllast.html
https://www.tdcj.texas.gov/death_row/dr_info/rhoadesricklast.html
https://www.tdcj.texas.gov/death_row/dr_info/hummeljohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/jonesquintinlast.html

Test one of the URLs out in the browser.

xopen("https://www.tdcj.texas.gov/death_row/dr_info/swearingenlarrylast.html")

Create the info pattern

Now I want the offender information links (so I omit the links with last in the pattern).

info_pattern <- !stringr::str_detect(
                            string = Links, 
                            pattern = "last")
Links[info_pattern] |> 
  utils::head() |> 
  base::writeLines()
dr_info/ramirezjohn.html
dr_info/chanthakoummanekosoul.html
dr_info/buntionc.jpg
dr_info/rhoadesrick.html
dr_info/hummeljohn.html
dr_info/jonesquintin.html

Verify length and rows

Check the base::length() to see if it’s identical to the number of rows in ExecutedOffenders.

base::identical(x = base::length(Links[info_pattern]), 
                y = base::nrow(ExecutedOffenders))
[1] TRUE

Great!

Check the length() of info_links

info_links <- Links[info_pattern]
base::identical(x = base::length(info_links),
                y = base::nrow(ExecutedOffenders))
[1] TRUE

These are also identical. Repeat the URL process from above on the info_url

Create info_url column

Now we combine this with the https://www.tdcj.texas.gov/death_row/ URL.

ExecutedOffenders |> 
  dplyr::mutate(
    info_url = 
        paste0("https://www.tdcj.texas.gov/death_row/", 
                                  info_links)) |> 
  dplyr::pull(last_url) |> 
  utils::head(10) |> 
  base::writeLines()
https://www.tdcj.texas.gov/death_row/dr_info/ramirezjohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/chanthakoummanekosollast.html
https://www.tdcj.texas.gov/death_row/dr_info/buntioncarllast.html
https://www.tdcj.texas.gov/death_row/dr_info/rhoadesricklast.html
https://www.tdcj.texas.gov/death_row/dr_info/hummeljohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/jonesquintinlast.html
https://www.tdcj.texas.gov/death_row/dr_info/wardlowbillylast.html
https://www.tdcj.texas.gov/death_row/dr_info/ochoaabellast.html
https://www.tdcj.texas.gov/death_row/dr_info/gardnerjohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/runnelstravislast.html
# assign
ExecutedOffenders <- ExecutedOffenders |> 
  dplyr::mutate(
    info_url = 
        paste0("http://www.tdcj.state.tx.us/death_row/", 
                                  info_links))

These are complete URLs–assign this to ExecutedOffenders data frame. Put the InfoLinks into a tidy data frame.

info_links <- Links[info_pattern]

InfoLinks <- info_links |> 
  # turn into a tibble
  tibble::as_tibble(.name_repair = "unique") |> 
  # tidy
  tidyr::gather(key = "key",
                value = "value") |> 
  # rename the value
  dplyr::select(dr_info_url = value) |> 
  # create the new url with death row root
  dplyr::mutate(
    dr_info_url = paste0(
                    "http://www.tdcj.state.tx.us/death_row/", 
                    info_links))
InfoLinks |> dplyr::glimpse(60)
Rows: 576
Columns: 1
$ dr_info_url <chr> "http://www.tdcj.state.tx.us/death_row…

Check in browser

Test a few of these out in the browser:

xopen("http://www.tdcj.state.tx.us/death_row/dr_info/brookscharlie.html")

Now we assign these links to the ExecutedOffenders data frame. But first make sure they match up.

ExecutedOffenders |> 
  dplyr::select(last_name, 
                first_name) |>
  utils::head(10)
# A tibble: 10 × 2
   last_name       first_name
   <chr>           <chr>     
 1 Ramirez         John      
 2 Chanthakoummane Kosol     
 3 Buntion         Carl      
 4 Rhoades         Rick      
 5 Hummel          John      
 6 Jones           Quintin   
 7 Wardlow         Billy     
 8 Ochoa           Abel      
 9 Gardner         John      
10 Runnels         Travis    
ExecutedOffenders |> 
  dplyr::select(last_name, 
                first_name) |>
  utils::tail(10)
# A tibble: 10 × 2
   last_name   first_name
   <chr>       <chr>     
 1 Rumbaugh    Charles   
 2 Porter      Henry     
 3 Milton      Charles   
 4 De La Rosa  Jesse     
 5 Morin       Stephen   
 6 Skillern    Doyle     
 7 Barefoot    Thomas    
 8 O'Bryan     Ronald    
 9 Autry       James     
10 Brooks, Jr. Charlie   

Bind columns

Combine the ExecutedOffenders, LastUrl and InfoLinks.

# Use `dplyr::bind_cols()` to attach these columns 
# to `ExecutedOffenders` and rename to`ExOffndrsComplete`
ExecutedOffenders <- ExecutedOffenders |> 
  # add the info_url
  dplyr::bind_cols(LastUrl) |>
  # add the
  dplyr::bind_cols(InfoLinks) |>
  # move the names to the front
  dplyr::select(dplyr::ends_with("name"),
                # all else
                dplyr::everything())
ExecutedOffenders |> dplyr::glimpse(60)
Rows: 576
Columns: 14
$ last_name      <chr> "Ramirez", "Chanthakoummane", "Bunt…
$ first_name     <chr> "John", "Kosol", "Carl", "Rick", "J…
$ execution      <int> 576, 575, 574, 573, 572, 571, 570, …
$ offender_info  <chr> "Inmate Information", "Inmate Infor…
$ last_statement <chr> "Last Statement", "Last Statement",…
$ tdcj_number    <int> 999544, 999529, 993, 999049, 999567…
$ age            <int> 38, 41, 78, 57, 45, 41, 45, 47, 64,…
$ date           <chr> "10/5/2022", "8/17/2022", "4/21/202…
$ race           <chr> "Hispanic", "Other", "White", "Whit…
$ county         <chr> "Nueces", "Collin", "Harris", "Harr…
$ last_url       <chr> "https://www.tdcj.texas.gov/death_r…
$ info_url       <chr> "http://www.tdcj.state.tx.us/death_…
$ name_last_url  <chr> "https://www.tdcj.texas.gov/death_r…
$ dr_info_url    <chr> "http://www.tdcj.state.tx.us/death_…

Create indicator for .html vs .jpgs

Create a binary variable to identify if this is a .jpg or .html path and name the new data frame ExOffndrsComplete.

Use case_when() to create jpg_html

ExOffndrsComplete <- ExecutedOffenders |> 
  dplyr::mutate(jpg_html = 
    dplyr::case_when(
      str_detect(string = info_url, pattern = ".jpg") ~ "jpg", 
      str_detect(string = info_url, pattern = ".html") ~ "html")) 
ExOffndrsComplete |> dplyr::count(jpg_html)
# A tibble: 2 × 2
  jpg_html     n
  <chr>    <int>
1 html       195
2 jpg        381

Selector Gadget

Download the selector gadget app for your browser. You can identify the various elements in a webpage using the selector gadget. Read this tutorial to see how it works.

Use the selector gadget to locate the css tags for overflow, a

Last statement url

Check the last statement for inmate Carl Buntion, #993

Info url

Check out the info url inmate Kosoul Chanthakoummane, #999529

Downloading images

Now that we’ve downloaded the table of executed offenders from the Texas Department of Criminal Justice website and rebuilt the URLs to their last statements and info, we are going to download the mugshots that are stored as .jpgs.

ExOffndrsComplete |> dplyr::glimpse(60)
Rows: 576
Columns: 15
$ last_name      <chr> "Ramirez", "Chanthakoummane", "Bunt…
$ first_name     <chr> "John", "Kosol", "Carl", "Rick", "J…
$ execution      <int> 576, 575, 574, 573, 572, 571, 570, …
$ offender_info  <chr> "Inmate Information", "Inmate Infor…
$ last_statement <chr> "Last Statement", "Last Statement",…
$ tdcj_number    <int> 999544, 999529, 993, 999049, 999567…
$ age            <int> 38, 41, 78, 57, 45, 41, 45, 47, 64,…
$ date           <chr> "10/5/2022", "8/17/2022", "4/21/202…
$ race           <chr> "Hispanic", "Other", "White", "Whit…
$ county         <chr> "Nueces", "Collin", "Harris", "Harr…
$ last_url       <chr> "https://www.tdcj.texas.gov/death_r…
$ info_url       <chr> "http://www.tdcj.state.tx.us/death_…
$ name_last_url  <chr> "https://www.tdcj.texas.gov/death_r…
$ dr_info_url    <chr> "http://www.tdcj.state.tx.us/death_…
$ jpg_html       <chr> "html", "html", "jpg", "html", "htm…

purrr and iteration

We will use purrrs iteration tools to download the images attached to the website profiles. Follow the three purrr steps from the workshop by Charlotte Wickham. See the accompanying slides, too.

We’ll go over these steps below:

purrr 1) Do ‘it’ for one element

First we need a test image (test_image) from the jpg_html column:

test_image <- ExOffndrsComplete |> 
  # only jpg row
  dplyr::filter(jpg_html == "jpg") |> 
  # pull the info url column
  dplyr::select(info_url) |> 
  # sample 1
  dplyr::sample_n(size = 1) |> 
  # convert to character 
  base::as.character() 

We can test the new url columns in the ExecOffenders with the magick::image_read() function.

# pass test_image to image_read()
magick::image_read(test_image)

You should see an image in the RStudio viewer pane

2. Turn ‘it’ into a recipe

Use dplyr::filter to select only the .jpgs in the ExOffndrsComplete and create ExOffndrsCompleteJpgs.

Pull the urls into a vector (jpg_url), then create a folder to download them into (jpg_path).

ExOffndrsCompleteJpgs <- ExOffndrsComplete |> 
  dplyr::filter(jpg_html == "jpg") 
jpg_url <- ExOffndrsCompleteJpgs$info_url
fs::dir_create("inst/extdata/jpgs/")
jpg_path <- paste0("inst/extdata/jpgs/", base::basename(jpg_url))
jpg_path |> utils::head()

3. Use purrr::purrr::walk2() to download all files

Now use the purrr::walk2() function to download the files. How does walk2 work?

First look at the arguments for utils::download.file().

?utils::download.file

How to walk2()

The help files tell us the walk2 function is “specialized for the two argument case”. So .x and .y become the two arguments we need to iterate over download.file(). We will walk through this step-by-step below:

  1. .x = the file path, which we created with the selector gadget above (in jpg_url)

  2. .y = the location we want the files to end up (jpg_path), and

  3. the function we want to iterate over .x and .y (download.file).

When we pass everything to purrr::walk2, R will go to the URL, download the file located at the URL, and put it in the associated jpgs/ folder.

Download .jpg files

Execute the code below and you will see the .jpgs downloading into the jpg folder.

purrr::walk2(.x = jpg_url, 
             .y = jpg_path, 
             .f = download.file)

You should see the following in your console.

You will see text similar to the content below.

# trying URL 'http://www.tdcj.state.tx.us/death_row/dr_info/robisonlarry.jpg'
# Content type 'image/jpeg' length 108341 bytes (105 KB)
# ==================================================
# downloaded 105 KB
# 
# trying URL 'http://www.tdcj.state.tx.us/death_row/dr_info/hicksdavid.jpg'
# Content type 'image/jpeg' length 139150 bytes (135 KB)
# ==================================================
# downloaded 135 KB

This might take awhile, but when its done, check the number of files in this folder.

fs::dir_info("./inst/extdata/jpgs/") |> 
  tibble::as_tibble() |> 
  dplyr::arrange(desc(size)) |> 
  dplyr::select(path, type, size) |> 
  dplyr::glimpse(60)
# Rows: 381
# Columns: 3
# $ path <fs::path> "./inst/extdata/jpgs/riddlegranville.jpg…
# $ type <fct> file, file, file, file, file, file, file, fil…
# $ size <fs::bytes> 244K, 225K, 202K, 195K, 190K, 189K, 189…

There you have it! 381 images of downloaded offenders!



magick::image_read()

You should see the .jpg file in the RStudio Viewer pane after running magick::image_read()



purrr::walk2()

When downloading the .jpg files with purrr::walk2(), you should see this in your console:

Data Description

Executions by age and race


Executions by age and race

Characteristic N = 5761
age 39 (33, 45)
race
    Black 206 (36%)
    Hispanic 110 (19%)
    Other 3 (0.5%)
    White 257 (45%)
1 Median (IQR); n (%)

Executions by date and month



Executions by date and month

Characteristic N = 5761
date 1982-12-07 to 2022-10-05
month
    Jan 61 (11%)
    Feb 45 (7.8%)
    Mar 49 (8.5%)
    Apr 43 (7.5%)
    May 59 (10%)
    Jun 58 (10%)
    Jul 34 (5.9%)
    Aug 52 (9.0%)
    Sep 52 (9.0%)
    Oct 39 (6.8%)
    Nov 50 (8.7%)
    Dec 34 (5.9%)
1 Range; n (%)

Visualize

Total executions

Total executions by race

Executions by race and year

Peak Executions by Year and Race

All

Images

Magick

I used the magik package for processing and manipulating the mugshot images. I advise checking out the entire vignette for more examples.

Create test image

The executed offender we’ll include in the plot is Billy Joe Woods from Waller County, Texas. He was executed on April 14th, 1997. The LA Times covered this case in an article titled, Executions in Texas: No Big Deal.

At this rate, with 12 executions already, 1997 is expected to become the busiest year ever on Texas’ death row–or any other state’s, for that matter–surpassing the previous record of 19, set here in 1995. If and when that happens–most likely next month–it could very well reignite the debate over capital punishment across the nation, inspiring activists on both sides of the issue alternately to hail and condemn the milestone.

I create test_magick_img from magick::image_read(), and then go on making the transformations as necessary.

"https://www.tdcj.texas.gov/death_row/dr_info/woodsbillyjoe.jpg" -> test_image
test_magick_img <- magick::image_read(test_image)

To remove the text and focus on the mugshot. This might need to be adjusted slightly for each new test_magick_img.

# crop this image
crop_750_1000_10 <- magick::image_crop(
  image = test_magick_img,
  geometry = "750x1000+10"
)

This trims the extra space off the bottom of the image.

Rotate with magick::image_rotate()

I want to rotate this image by 90 degrees.

# rotate this image
crop_750_1000_10_rotate90 <- magick::image_rotate(
  crop_750_1000_10,
  degrees = 90
)

Remove the rest of the text and focus on the mugshot. This might need to be adjusted slightly for each new image

# crop this image
crop_850_950_60 <- magick::image_crop(
  image = crop_750_1000_10_rotate90,
  geometry = "850x950+60"
)

Rotate this image back to center (image_rotate again) and flip it using magick::image_flip()

# rotate this image
crop_850_950_60_rotate270 <- magick::image_rotate(
  crop_850_950_60,
  degrees = 270)
# rotate this image
crop_850_950_60_flip <- magick::image_flip(crop_850_950_60_rotate270)

Crop the rest of the text out of the image, trim the white-space for the plot, and flip the image.

# crop this image
crop_750_255_300 <- magick::image_crop(
  image = crop_850_950_60_flip,
  geometry = "750x255+300"
)
# flip this image again
crop_850_950_60_flip <- magick::image_flip(crop_750_255_300)

Crop out the last little bit of the document.

# crop the dot out
crop_435_352_1 <- magick::image_crop(
  image = crop_850_950_60_flip,
  geometry = "435x352+1"
)

Image effects with magick

We will use magick::image_trim() to clean the image up a bit.

# Here we will trim the image up a bit with the `fuzz` argument
test_magick_clean <- magick::image_trim(
  image = crop_435_352_1,
  fuzz = 1
)

Now that I have all the trimming on and cropping done, I will add some effects for the ggplot2 image.

I want the image to have a bit more contrast, so I will use magick::image_contrast() to create these effects.

test_magick_final <- magick::image_contrast(image = test_magick_clean, 
  sharpen = TRUE 
)

Export .jpg

Export the image as a .jpg

magick::image_write(test_magick_final,
        path = "www/test_magick_final.jpg")

If we print test_magick_img to the console, the images comes up in the viewer pane.

Original image

Final image

Graph with image

I create TotalExecYear as the data for the image to appear on top of.

TotalExecYear <- ExOffndrsComplete |>
  dplyr::count(year, sort = TRUE)
head(TotalExecYear)
# A tibble: 6 × 2
   year     n
  <dbl> <int>
1  2000    40
2  1997    37
3  1999    35
4  2002    33
5  2007    26
6  2003    24

Import .jpg

library(jpeg)
imgJPEG <- jpeg::readJPEG("www/test_magick_final.jpg")

Build labels

labs_bjw <- labs(
  title = "Billy Joe Woods was the twelfth execution in Texas in 1997",
  subtitle = "1997 is expected to become the busiest year ever on Texas’ death row",
  caption = "https://www.latimes.com/archives/la-xpm-1997-05-20-mn-60637-story.html",
  x = "Year",
  y = "Total Executions"
)

Build a base plot

ggp2_bjw_base <- ggplot(
  data = TotalExecYear, 
  aes(x = factor(year),
    y = n)) 

Create background_image() with ggpubr

ggp2_bjw_bg_img <- ggpubr::background_image(imgJPEG) 

Create column graph

ggp2_bjw_col <- geom_col(aes(fill = n),
  position = "stack",
  show.legend = FALSE) 

Build x scale

ggp2_bjw_x_discrete <- scale_x_discrete(
    breaks = c("1980","1985", "1990", "1995",
    "2000", "2005", "2010", "2015", "2020"))

Set colors with paletteer

ggp2_bjw_fill_pal <- paletteer::scale_fill_paletteer_c("ggthemes::Red") 

Add labels using ggrepel

We also want to label the specific point on the graph where Billy Joe Woods was executed. We can do this by filtering the ExecOffenders to only his row in that dataset, then creating a label variable with paste0(). We now want to limit the columns in ExecOffLabel to only those values on the graph, which are year, n, and label

ExecOffLabel <- ExOffndrsComplete %>% 
  filter(last_name == "Woods" & first_name == "Billy") %>%
  mutate(
    year = factor(year),
    label = paste0(last_name, ", ", first_name, ": executed on ", date),
    n = 12) %>% 
  select(year, n, label)
ExecOffLabel
# A tibble: 1 × 3
  year      n label                               
  <fct> <dbl> <chr>                               
1 1997     12 Woods, Billy: executed on 1997-04-14

We will add the geom_label_repel() layer with ExecOffLabelData and increase the size of the label so it stands out.

ggp2_bjw_lab_repel <- ggrepel::geom_label_repel(
  data = ExecOffLabel,
      aes(x = year,
          y = n,
          label = label),
          # set color and size...
          color = "black",
          size = 2) 

Build plot

ggp2_bjw_base + 
  ggp2_bjw_bg_img + 
  ggp2_bjw_col + 
  ggp2_bjw_x_discrete +
  ggp2_bjw_fill_pal +
  labs_bjw +
  theme_minimal()

Add labels

ggp2_bjw_base + 
  ggp2_bjw_bg_img + 
  ggp2_bjw_col + 
  ggp2_bjw_lab_repel +
  ggp2_bjw_x_discrete +
  ggp2_bjw_fill_pal +
  labs_bjw +
  ggpubr::theme_pubclean()

Summary

Row

Key findings

Overall

Themes

Discussion

Row

Future work

Applications

Acknowledgements